Skip to content

Consolidate Dataclass data update methods - use DIB for update only#7483

Open
XingY wants to merge 11 commits intodevelopfrom
fb_sourceDIB
Open

Consolidate Dataclass data update methods - use DIB for update only#7483
XingY wants to merge 11 commits intodevelopfrom
fb_sourceDIB

Conversation

@XingY
Copy link
Contributor

@XingY XingY commented Mar 10, 2026

Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR consolidates DataClass and Sample update behavior around DataIteratorBuilder (DIB)-based updates, removing LSID as an update/merge key and cleaning up deprecated DataClass storage by dropping the provisioned lsid column via a module upgrade.

Changes:

  • Remove LSID-as-key support for sample/data update & merge; enforce RowId or Name-based keys and ignore LSID when provided alongside valid keys.
  • Introduce shared updateRowsUsingPartitionedDIB() logic in AbstractQueryUpdateService and route Sample/DataClass updates through it.
  • Remove provisioned DataClass lsid column (domain kind + upgrade code + DB scripts) and update integration tests accordingly.

Reviewed changes

Copilot reviewed 15 out of 15 changed files in this pull request and generated 7 comments.

Show a summary per file
File Description
experiment/src/org/labkey/experiment/samples/AbstractExpFolderImporter.java Stops setting the removed UseLsidForUpdate option during TSV import.
experiment/src/org/labkey/experiment/api/SampleTypeUpdateServiceDI.java Routes updates through partitioned DIB; removes LSID-key paths and legacy update methods.
experiment/src/org/labkey/experiment/api/ExpDataImpl.java Selects DataClass provisioned properties by RowId (not LSID) to support dropped provisioned LSID column.
experiment/src/org/labkey/experiment/api/ExpDataClassDataTableImpl.java Enforces RowId/Name keys, blocks LSID-only keying, blocks RowId in merge (behind feature), and uses partitioned DIB updates.
experiment/src/org/labkey/experiment/api/DataClassDomainKind.java Removes lsid from provisioned DataClass base properties, indexes, and FKs.
experiment/src/org/labkey/experiment/ExperimentUpgradeCode.java Adds deferred upgrade to drop provisioned DataClass lsid column (and related indices).
experiment/src/org/labkey/experiment/ExperimentModule.java Bumps schema version to 26.005 and centralizes experimental flag constant usage.
experiment/src/org/labkey/experiment/ExpDataIterators.java Updates derivation/update logic to use RowId/Name instead of LSID-for-update config; removes LSID update config usage.
experiment/src/client/test/integration/DataClassCrud.ispec.ts Expands integration coverage for keying behavior (RowId/Name/LSID) and partitioned-duplicate detection.
experiment/resources/schemas/dbscripts/sqlserver/exp-26.004-26.005.sql Executes Java upgrade to drop provisioned DataClass lsid column (SQL Server).
experiment/resources/schemas/dbscripts/postgresql/exp-26.004-26.005.sql Executes Java upgrade to drop provisioned DataClass lsid column (PostgreSQL).
api/src/org/labkey/api/query/DefaultQueryUpdateService.java Removes conversion-table hook and always uses getDbTable() for convertTypes() entrypoint.
api/src/org/labkey/api/query/AbstractQueryUpdateService.java Adds updateRowsUsingPartitionedDIB() and small null-iterator safety in _pump().
api/src/org/labkey/api/exp/query/ExpDataTable.java Reorders enum constants (no functional behavior change expected).
api/src/org/labkey/api/exp/api/ExperimentService.java Removes UseLsidForUpdate query option; adds shared experimental feature constant.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +1085 to +1087
// using update from file, verify update using rowId for data that doesn't exist on this datacalss should fail.
errorResp = await ExperimentCRUDUtils.importData(server, "RowId\tDescription\n" + row3RowId + "\tupdate\n", emptyDataClass, "UPDATE", topFolderOptions, editorUserOptions);
expect(errorResp.text).toContain('Data not found for [' + row3RowId + ']');
Copy link

Copilot AI Mar 10, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Typo in comment: "datacalss" should be "dataclass".

Copilot uses AI. Check for mistakes.
Comment on lines +465 to +466
// update date twice specifying the rowId across multiple partitions
await server.post('query', 'updateRows', {
Copy link

Copilot AI Mar 10, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Typo in comment: "update date" should be "update data".

Copilot uses AI. Check for mistakes.
Comment on lines +286 to 287
OptionalFeatureService.get().addExperimentalFeatureFlag(ExperimentService.EXPERIMENTAL_FEATURE_ALLOW_ROW_ID_MERGE, "Allow RowId to be accepted when merging samples or dataclass data",
"If the incoming data includes a RowId column we will allow the column but ignore it's values.", false);
Copy link

Copilot AI Mar 10, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Grammar in this feature description: "ignore it's values" should be "ignore its values".

Copilot uses AI. Check for mistakes.
di = new SampleUpdateDerivationDataIterator(di, context, _container, _user, _currentDataType, _checkRequiredParents);
else
di = new DataUpdateDerivationDataIterator(di, context, _container, _user, _currentDataType, _checkRequiredParents);
di = new DataUpdateDerivationDataIterator(di, context, _container, _user, _currentDataType, _checkRequiredParents);//
Copy link

Copilot AI Mar 10, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Remove the trailing "//" at the end of this line; it looks like leftover debugging text and adds noise to the code.

Copilot uses AI. Check for mistakes.
if (hasNext)
{
String lsid = (String) get(_lsidCol);
String lsid = (String) get(_lsidCol); // why lsid?, insert or merge
Copy link

Copilot AI Mar 10, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please remove the inline comment "// why lsid?, insert or merge". It reads like a note-to-self and doesn't clarify behavior for future readers.

Copilot uses AI. Check for mistakes.
Comment on lines 1299 to +1300
for (Map<String, Object> dataRow : dataRows.values())
lsids.add((String) dataRow.get("lsid"));
lsids.add((String) dataRow.get("lsid")); // ?
Copy link

Copilot AI Mar 10, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Remove the "// ?" comment here. If there's an open question about whether LSIDs are always present in these rows, it should be resolved in code (e.g., null-check + clear error) rather than left as an inline question.

Copilot uses AI. Check for mistakes.
Comment on lines +446 to 447
// update date twice specifying the name across multiple partitions
await server.post('query', 'updateRows', {
Copy link

Copilot AI Mar 10, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Typo in comment: "update date" should be "update data".

Copilot uses AI. Check for mistakes.
@labkey-adam
Copy link
Contributor

@XingY there's an exp-26.004-26.005.sql script in 26.3 that should be merged to develop shortly.

@XingY
Copy link
Contributor Author

XingY commented Mar 10, 2026

@XingY there's an exp-26.004-26.005.sql script in 26.3 that should be merged to develop shortly.

OK. Thanks for the heads-up.

@XingY
Copy link
Contributor Author

XingY commented Mar 23, 2026

Finding #1 — [Severity: Critical] — SQL Ambiguity — experiment/src/org/labkey/experiment/api/ExpDataClassType.java:87
Issue: After joining the provisioned table (ds) with exp.data (d), the unqualified column reference Name in the SQL SELECT is ambiguous. The provisioned table still has a name column (from BASE_PROPERTIES), and exp.data also has a
Name column. PostgreSQL and SQL Server will both raise an ambiguous column error at runtime.

The generated SQL is effectively:
SELECT ... concat(?, ':', Name) AS Description
FROM expdataclass.storageName ds
JOIN exp.data d ON d.rowId = ds.rowid
WHERE ...
Both ds.name and d.Name exist. On PostgreSQL, unquoted Name normalizes to name, matching both.

Why it matters: Any DataClass with attachment-type properties will cause a SQL error when the attachment parent lookup runs (e.g., during attachment integrity checks, admin pages, or file management), breaking the feature
entirely.
Suggestion: Qualify the column reference: new SQLFragment("ds.Name") or new SQLFragment("d.Name").


Finding #2 — [Severity: High] — Behavioral side effect — api/src/org/labkey/api/dataiterator/StatementDataIterator.java:539-544
Issue: The change to get(int i) now falls through to _data.get(i) when a key column's value is null:
if (null != _keyColumnInfo.get(i))
{
Object value = _keyValues.get(i);
if (value != null)
return value;
}
return _data.get(i);
Previously, if _keyColumnInfo.get(i) was non-null (i.e., it IS a key column), the key value was always returned — even if null. Now, a null key value causes fallthrough to the data source. This change is global to
StatementDataIterator and affects all code paths using it, not just DataClass updates. Any UPDATE/INSERT where a key column legitimately has a null generated value would now silently return the data-source value instead of null,
potentially masking real issues or changing behavior.

Why it matters: This is a shared, low-level data iterator class. A behavioral change here could subtly affect SampleType updates, List updates, or any other table that goes through StatementDataIterator. Bugs would manifest as
incorrect data being written.
Suggestion: Consider scoping this change more narrowly — perhaps only fall through for the LSID column specifically, or add a flag that DataClassUpdateAddColumnsDataIterator sets to opt-in to this behavior. At minimum, add a
comment explaining WHY this change is needed and what invariant it relies on.


Finding #3 — [Severity: Medium] — Incorrect comment — experiment/src/org/labkey/experiment/ExperimentUpgradeCode.java:1002
Issue: The Javadoc says Called from exp-26.004-26.005.sql but the actual SQL scripts are exp-26.005-26.006.sql.
Why it matters: Misleading comments make future debugging and maintenance harder.
Suggestion: Change to Called from exp-26.005-26.006.sql.


Finding #4 — [Severity: Medium] — Wrong import / semantic mismatch — experiment/src/org/labkey/experiment/api/ExpDataClassDataTableImpl.java:1031,1045
Issue: The file imports LSID from ExpMaterialTable.Column (the Sample/Material enum) and uses it in DataClass context:
import static org.labkey.api.exp.query.ExpMaterialTable.Column.LSID;
...
keysCheck.contains(LSID.name())
...
new ValidationException("RowId is not accepted when merging data...", ExpMaterialTable.Column.RowId.name())
While .name() returns "LSID"/"RowId" for both enums so it's functionally correct, using the Material enum in a DataClass context is semantically confusing and could mislead future readers.
Why it matters: Maintainability — a future developer may think this is intentionally referencing Material behavior.
Suggestion: Import from ExpDataTable.Column instead.


Finding #5 — [Severity: Medium] — Debug artifacts left in code — multiple files
Issue: Several WIP/debug comments and artifacts remain:

  • ExpDataIterators.java:912: di = new DataUpdateDerivationDataIterator(...);// — trailing // comment
  • ExpDataIterators.java:1183: String lsid = (String) get(_lsidCol); // why lsid?, insert or merge — uncertainty comment
  • ExpDataClassDataTableImpl.java:1318: lsids.add((String) dataRow.get("lsid")); // ? — question mark comment

Why it matters: These look like personal notes/TODOs that should be resolved before merge.
Suggestion: Resolve the questions these comments raise, then remove or rewrite them as proper documentation.


Finding #6 — [Severity: Medium] — Duplicate detection only catches first duplicate — api/src/org/labkey/api/query/AbstractQueryUpdateService.java:985-1003
Issue: checkPartitionForDuplicates returns immediately after finding the first duplicate:
if (rowId != null && !globalRowIds.add(rowId))
{
errors.addRowError(new ValidationException("Duplicate key provided: " + rowId));
return;
}
This means if there are multiple duplicate keys, only the first is reported. The user must fix one, re-submit, find the next, etc.
Why it matters: Poor UX for bulk operations. Users may need multiple round-trips to fix all duplicates.
Suggestion: Continue checking and report all duplicates in one pass (remove the early return), or accept this as intentional behavior and add a comment explaining why.


Finding #7 — [Severity: Low] — Null configParameters passed to recordDataIteratorUsed — experiment/src/org/labkey/experiment/api/ExpDataClassDataTableImpl.java:1451
Issue:
Map<Enum, Object> finalConfigParameters = configParameters == null ? new HashMap<>() : configParameters;
recordDataIteratorUsed(configParameters); // original null value passed, not finalConfigParameters
If configParameters is null, the original null is passed to recordDataIteratorUsed rather than finalConfigParameters.
Why it matters: If recordDataIteratorUsed tries to read or modify the map, it could NPE. (This is pre-existing in the SampleType code, but worth noting as it was copied.)
Suggestion: Pass finalConfigParameters instead.


Finding #8 — [Severity: Low] — _tableInfo field never used — api/src/org/labkey/api/dataiterator/DataClassUpdateAddColumnsDataIterator.java:38
Issue: The field private final TableInfo _tableInfo is assigned in the constructor but never read anywhere in the class.
Why it matters: Dead code.
Suggestion: Remove the field if it's not needed.


Phase 4: Security

No security issues found. The changes maintain existing authorization patterns — updates go through the standard QueryUpdateService pipeline which checks permissions. The SimpleFilter in
DataClassUpdateAddColumnsDataIterator.prefetchExisting() properly scopes queries by ClassId and Container, preventing cross-container or cross-DataClass data access. No user input is directly concatenated into SQL.

Phase 5: Interactions and Side Effects

  • StatementDataIterator change (Finding Refactor WebDAV URL generation to make it easier to use in other code… #2) is the highest-risk interaction. This class is used broadly across the platform, and the null-key fallthrough behavior change could affect SampleTypes, Lists, or other update paths.
  • The SampleTypeUpdateServiceDI refactoring correctly delegates partitioned update logic to the parent class and overrides validatePartitionedRowKeys to call confirmAmountAndUnitsColumns. The deleted coerceTypes method was marked
    @deprecated, so its removal is expected.
  • The updateRow() and _update() methods in ExpDataClassDataTableImpl.DataClassDataUpdateService now throw UnsupportedOperationException. Any caller that reaches these through the non-DIB path (e.g., via super.updateRows() with
    oldKeys provided) would crash. The test coverage should verify this doesn't happen, but there could be callers outside the experiment module.
  • Tests are comprehensive for the new DataClass behavior — they cover LSID rejection, RowId/Name updates, partitioned updates, cross-DataClass validation, and merge restrictions. However, there don't appear to be tests
    specifically for the StatementDataIterator change's impact on other entity types.

Overall Assessment

This is a well-structured consolidation that brings DataClass updates in line with the established SampleType DIB pattern and properly eliminates the LSID provisioned column. The most critical issue is the ambiguous Name column in
ExpDataClassType.java (Finding #1), which will cause a SQL error at runtime for any DataClass with attachment properties — this must be fixed before merge. The StatementDataIterator behavioral change (Finding #2) is the second
most concerning item since it modifies shared infrastructure with potentially broad impact; it deserves extra scrutiny and ideally a more targeted approach. The remaining findings are medium/low severity cleanup items that would
improve code quality.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants